Abstract

An operational perspective is used to understand the relationship between source and channel coding. This is based on a
direct reduction of one problem to another that uses random coding (and hence common randomness) but, unlike all prior work,
does not involve any functional computations, in particular, no mutual-information computations. This result is then used to prove
a universal source-channel separation theorem in the rate-distortion context, where universality is in the sense of a compound
"general channel."

I. Introduction

The essential duality between source and channel coding has been recognized since Shannon [1] and has attracted significant
attention recently as well (e.g. [2], [3], [4]). This paper addresses a conceptual issue: what is the core relationship between
source and channel coding and to what extent do we need mutual-information computations to understand it?

Recall that classically, mutual information plays a critical role. After all, the traditional separation theorem (separate source and
channel codes result in no loss in first-order^1 optimality when delay is not an issue) relies crucially on the mutual-information
characterization of both channel capacity and the rate-distortion function to prove the converse direction: that we can do no
better. Even the more general framework of [2] builds upon the information-spectrum approach of [6] that extends
mutual-information ideas to general channels by looking at the entire distribution of an information random variable instead of
just the expectation.

Recently, a "direct" proof of the converse direction of the separation theorem was introduced by us in [7]. The key idea
was to treat a combined joint source-channel code as a non-causal arbitrarily-varying channel (AVC) with a particularly weak
guarantee: as long as the input to it is drawn like the source in question, it will with high probability return an output within
a specified distortion D. For such a channel, a random-coding argument revealed that reliable communication is possible at a
rate given by the rate-distortion function of the source in question evaluated at D. The proof in [7], however, relied crucially
on the mutual-information characterization of the rate-distortion function.

Conceptually, there are two distinct directions that one can explore from [7]. Lomnitz and Feder in [8] essentially emphasize
the mutual-information aspects for a core result that avoids the need for an a priori distortion guarantee, and they then use
feedback to translate this core into a meaningful interpretation concerning "communication over individual channels." This is
in the spirit of individual-sequence results as distinct from the AVC perspective taken in [7]. The contribution of this work is to
move in the complementary direction. After introducing notation and definitions in Section II, we give a new operational proof
in Section III that does not use mutual-information computations in any way. It illuminates the operational connections and
technical parallels between the problems of reliable communication at a particular rate and lossy communication of a source
to within a target distortion, in effect providing a direct "problem reduction" in the style that theoretical computer scientists
use. It shows that the rate-distortion function of X gives the universal capacity of the compound set of general channels that
communicate i.i.d. X sources to within a distortion D (see Theorem 3.1 for a precise statement). This naturally gives rise to
a universal source-channel separation theorem in Section IV.



II. Notation and Definitions 

Sets, random variables, and distortion measure: Many symbols will have an interpretation for both rate-distortion source
coding and channel coding problems. $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$, a finite set, should be thought of as the
channel input alphabet or the alphabet of the source that needs to be source-coded. $\mathcal{Y} = \{1, 2, \ldots, |\mathcal{Y}|\}$
should similarly be thought of as the channel output alphabet or the reconstruction alphabet of the source. Let $X$ be a random
variable on $\mathcal{X}$; $p_X$ will denote the corresponding probability distribution. $d : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$
is a non-negative real-valued function that represents the distortion incurred when $x \in \mathcal{X}$ is reconstructed as
$y \in \mathcal{Y}$.

^1 Csiszár established in [5] that strict separation does result in a significant loss in the error exponent. Separation also
breaks down even in a first-order sense for multiterminal problems.



Notation: A superscript $n$ denotes a variable whose block length is $n$. For example, $Y^n$ will denote a random variable on $\mathcal{Y}^n$.
Method of Types: We follow the notation of Csiszár and Körner [9].

Channel model: A channel is a sequence of transition-probability matrices and will be denoted by $\langle c^n \rangle_1^\infty$.
Its operation should be thought of as follows for block length $n$: the channel input space is $\mathcal{X}^n$, the channel output
space is $\mathcal{Y}^n$, and the channel acts as $c^n$: $c^n(y \mid x)$ is the probability that the channel output is
$y \in \mathcal{Y}^n$ when the channel input is $x \in \mathcal{X}^n$. No causality, memorylessness, or nestedness assumptions are
made on $\langle c^n \rangle_1^\infty$. This channel model is the same as that of Verdú and Han in [6].

[Figure: the channel input $x \in \mathcal{X}^n$ passes through $c^n$, producing the output $y \in \mathcal{Y}^n$ with probability $c^n(y \mid x)$.]
Definition 2.1 ($\mathcal{C}_{X,D}$): Consider a channel $\langle c^n \rangle_1^\infty$. If the input to the channel is an i.i.d. $X$
source $X^n$, the channel output is a (not necessarily i.i.d.) random variable $Y^n$ on $\mathcal{Y}^n$. A channel is said to belong
to $\mathcal{C}_{X,D}$ if, under the joint distribution $p_{X^n Y^n}$ on the input-output space,

$$\Pr\left( \frac{1}{n} \sum_{i=1}^{n} d(X^n(i), Y^n(i)) > D \right) \to 0 \quad \text{as } n \to \infty. \qquad (1)$$




The i.i.d. $X$ sequence $X^n$ here is just a tool in the definition of the compound channel set $\mathcal{C}_{X,D}$. It does not
mean that one is necessarily trying to communicate solely i.i.d. $X$ sources over the channel using uncoded transmission.
Intuitively, one can think of a channel $\in \mathcal{C}_{X,D}$ as follows: most $p_X$-typical sequences of length $n$ are usually
distorted to within a distortion $nD$. Channels $\in \mathcal{C}_{X,D}$ will be called channels that directly communicate an i.i.d.
$X$ source to within a distortion level $D$.
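
As a rough numerical illustration of Definition 2.1, the sketch below estimates the probability in (1) by simulation for one toy channel. The binary alphabet, the Hamming distortion, the memoryless bit-flipping channel, and all function names are assumptions made for this example only; the definition itself places no such restrictions on the channel.

```python
import random

def average_distortion(x, y, d):
    """Per-letter average distortion (1/n) * sum_i d(x[i], y[i])."""
    return sum(d(a, b) for a, b in zip(x, y)) / len(x)

def hamming(a, b):
    """Hamming distortion: 0 if the letters agree, 1 otherwise."""
    return 0 if a == b else 1

def flip_channel(x, flip_prob, rng):
    """Toy memoryless channel (an assumption): flips each bit independently."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in x]

def excess_distortion_frequency(n, D, flip_prob, trials, seed=0):
    """Empirical frequency of the event {average distortion > D} from (1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        # i.i.d. uniform binary source block of length n
        x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
        y = flip_channel(x, flip_prob, rng)
        if average_distortion(x, y, hamming) > D:
            exceed += 1
    return exceed / trials

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, excess_distortion_frequency(n, D=0.2, flip_prob=0.1, trials=200))
```

For this toy channel, a flip probability below $D$ makes the estimated frequency shrink as $n$ grows, which is the behaviour that membership in $\mathcal{C}_{X,D}$ asks for. A simulation of this kind is only suggestive and does not prove membership.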

The process of communication: Block codes will be used for communication with block length $n$. The channel input space
$\mathcal{X}^n$ is the Cartesian product of $\mathcal{X}$, $n$ times: $\mathcal{X}^n = \{x_1, x_2, \ldots, x_{|\mathcal{X}|^n}\}$.
The channel output space $\mathcal{Y}^n$ is the Cartesian product of $\mathcal{Y}$, $n$ times:
$\mathcal{Y}^n = \{y_1, y_2, \ldots, y_{|\mathcal{Y}|^n}\}$. If we want to communicate at rate $R$, the message set is
$\mathcal{M}^n = \{1, 2, \ldots, 2^{nR}\}$. The message reproduction set $\hat{\mathcal{M}}^n$ is the same as $\mathcal{M}^n$.
A deterministic encoder is a map $e^n : \mathcal{M}^n \to \mathcal{X}^n$ and similarly, a deterministic decoder is a map
$d^n : \mathcal{Y}^n \to \hat{\mathcal{M}}^n$. Deterministic encoder-decoders will be denoted as d-encoder-decoders. A
stochastic-coupled sc-encoder-decoder is the same as a random code. The encoder comes from a family of codes and the
decoder has access to the realization of the encoder through common randomness; that is, the encoder and decoder have
access to a shared random variable of sufficient entropy. We do not worry here about how much common randomness is used.
For a given block length, stochastic-coupled encoder-decoders will be denoted by $(e^n, d^n)$ and overall by
$(e, d) = \langle e^n, d^n \rangle_1^\infty$.
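
The role of common randomness in an sc-encoder-decoder can be sketched as follows: both sides derive the same random codebook from a shared seed, so the decoder knows the realization of the encoder. The seed-based mechanism, the 0-based message indices, and all names below are assumptions of this illustration, not a construction taken from the paper.

```python
import random

def random_codebook(num_messages, n, p_x, seed):
    """Draw num_messages codewords of length n with letters i.i.d. from p_x.

    The seed stands in for the shared random variable (common randomness).
    """
    rng = random.Random(seed)
    letters = list(p_x.keys())
    weights = list(p_x.values())
    return [tuple(rng.choices(letters, weights=weights, k=n))
            for _ in range(num_messages)]

def make_sc_encoder(num_messages, n, p_x, shared_seed):
    """Encoder side: maps a message index (0-based here) to its codeword."""
    codebook = random_codebook(num_messages, n, p_x, shared_seed)
    return lambda m: codebook[m]

def decoder_codebook(num_messages, n, p_x, shared_seed):
    """Decoder side: rebuilds the identical codebook from the shared seed."""
    return random_codebook(num_messages, n, p_x, shared_seed)

if __name__ == "__main__":
    p_x = {0: 0.5, 1: 0.5}
    encode = make_sc_encoder(num_messages=8, n=5, p_x=p_x, shared_seed=42)
    book = decoder_codebook(num_messages=8, n=5, p_x=p_x, shared_seed=42)
    print(all(encode(m) == book[m] for m in range(8)))
```

A channel that is independent of the seed cannot adapt to the particular codebook drawn, which matches the presumption that the randomness in the encoder-decoder is independent of the channel.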

Universal capacity: Consider a compound set of channels $\mathcal{A}$. Consider a uniform distribution $M^n$ on $\mathcal{M}^n$,
so $p_{M^n}(m) = 2^{-nR}$ for all $m \in \mathcal{M}^n$. Each composition of $M^n$, encoder, channel from $\mathcal{A}$, and
decoder results in an output random variable $\hat{M}^n$ on $\hat{\mathcal{M}}^n$. This induces a joint probability distribution
$p_{M^n \hat{M}^n}$ on the message-message reproduction space $\mathcal{M}^n \times \hat{\mathcal{M}}^n$.
Rate $R$ is universally achievable over $\mathcal{A}$ under the average block error probability criterion if there exist
encoder-decoder pairs such that under this joint probability distribution, $\Pr(\hat{M}^n \neq M^n) \to 0$ as $n \to \infty$ for
each channel in $\mathcal{A}$. The randomness of the message and the randomness in the encoder-decoder are presumed to be
independent of the channel. The supremum of achievable rates is called the universal channel capacity $C_{sc}(\mathcal{A})$.



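[Figure: a message from the set $\mathcal{M}^n$ of $2^{nR}$ messages, drawn according to the uniform distribution $M^n$, passes
through the encoder $e^n$, the channel $c^n$, and the decoder $d^n$ to give $\hat{M}^n$. Rate $R$ is reliably achievable iff
$\Pr(\hat{M}^n \neq M^n) \to 0$ as $n \to \infty$ for all $c \in \mathcal{A}$.]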



The channel set $\mathcal{A}$ can be interpreted as an adversary, and in particular $\mathcal{C}_{X,D}$ is an adversary about which
something specific is known. One can ask the question of universal capacity of $\mathcal{A}$ by restricting the set of encoders and
decoders to be d or sc, and in general, one will get two different answers. Error criteria different from
$\Pr(\hat{M}^n \neq M^n) \to 0$ as $n \to \infty$ also exist, but they will not be considered in this paper.

Source-code and operational rate-distortion function: The source-coding problem is to code an i.i.d. $X$ source to within
a distortion level $D$ in the sense of (1) while using the smallest rate possible to do so. The goal is to find a deterministic
mapping whose output has the minimum cardinality and hence the smallest possible rate. See [9] for a precise statement. The
minimum possible rate is called the operational rate-distortion function and is denoted by $R_X(D)$.

The two problems between which we will see a close connection are:

• Universal capacity of $\mathcal{C}_{X,D}$, and

• Source coding i.i.d. $X$ to within a distortion level $D$.

These two questions are closely connected because the set of all (potentially random) source-codes that code an i.i.d. $X$ source
to within a distortion level $D$ is the same as $\mathcal{C}_{X,D}$.

III. $C_{sc}(\mathcal{C}_{X,D}) = R_X(D)$: CONNECTION BETWEEN SOURCE AND CHANNEL-CODING

Theorem 3.1: $C_{sc}(\mathcal{C}_{X,D}) = R_X(D)$

Proof: First proved in [7]. We give another proof here that directly shows the close connections between the source-coding
and channel-coding questions. The proof consists of two steps:

• A rate-distortion source-code can be interpreted as a particularly "bad" channel. The capacity of this "bad" channel is
capped at $R_X(D)$ by a simple cardinality bound. Thus, $C_{sc}(\mathcal{C}_{X,D}) \le R_X(D)$.

• There is a random coding scheme for which rates $< \alpha$ are achievable for $\mathcal{C}_{X,D}$. Since there might be another
scheme which performs even better, $C_{sc}(\mathcal{C}_{X,D}) \ge \alpha$.

Similarly, there is a coding scheme for which rates $> \alpha$ are achievable for the source-coding problem. There might be
another scheme which performs even better and so $R_X(D) \le \alpha$. Thus, $R_X(D) \le \alpha \le C_{sc}(\mathcal{C}_{X,D})$.

For $R_X(D) \ge C_{sc}(\mathcal{C}_{X,D})$, only a little more detail is needed. Consider a "good" rate-$R_X(D)$ source-code. This
source-code is a channel $\in \mathcal{C}_{X,D}$ with no more than $2^{nR_X(D)}$ possible outputs. Thus, the capacity of this
channel is $\le R_X(D)$, because if we try to communicate at rate $> R_X(D)$, "many" codewords will get mapped to the same output
sequence. This argument can be made precise (but longer) using standard techniques and proves
$C_{sc}(\mathcal{C}_{X,D}) \le R_X(D)$.
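
One way to see the counting step is sketched below, under the simplifying assumption (made here for illustration, not taken from [7]) that the source-code is a deterministic map, so the decoder sees at most $2^{nR_X(D)}$ distinct channel outputs. For any fixed realization of the common randomness the decoder is a function of the channel output, so each distinct output can be decoded correctly for at most one of the $2^{nR}$ equiprobable messages, and averaging over the common randomness preserves the bound:

```latex
\Pr(\hat{M}^n = M^n)
  \;\le\; \frac{\left|\{\text{distinct channel outputs}\}\right|}{2^{nR}}
  \;\le\; \frac{2^{nR_X(D)}}{2^{nR}}
  \;=\; 2^{-n\,(R - R_X(D))}
  \;\longrightarrow\; 0
  \quad \text{as } n \to \infty, \text{ whenever } R > R_X(D).
```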

Next, we prove $R_X(D) \le C_{sc}(\mathcal{C}_{X,D})$ using parallel random-coding arguments, placing those for channel coding and
source coding side by side to see the connection. See below:



| $C_{sc}(\mathcal{C}_{X,D})$ | $R_X(D)$ |
|---|---|
| Achievability | Achievability |
| Codebook generation: | Codebook generation: |
| Generate $2^{nR}$ codewords i.i.d. $p_X$ | Generate $2^{nR}$ codewords independently of each other, each with precise type $q_Y$ ($q_Y$ is some prob. distribution on $\mathcal{Y}$) |
| This is the codebook $\mathcal{K}$. Note: codewords $\in \mathcal{X}^n$ | This is the codebook $\mathcal{L}$. Note: codewords $\in \mathcal{Y}^n$ |
| Joint typicality: Let $\epsilon > 0$. $(x, y)$ jointly typical if | Joint typicality: Let $\epsilon > 0$. $(x, y)$ jointly typical if |
| i. $x$ $\epsilon$-typical: $p_x \in T(p_X, \epsilon)$ | i. $x$ $\epsilon$-typical: $p_x \in T(p_X, \epsilon)$ |
| ii. $\frac{1}{n} \sum_{i=1}^{n} d(x(i), y(i)) \le D$ | ii. $\frac{1}{n} \sum_{i=1}^{n} d(x(i), y(i)) \le D$ |
| | iii. $q_y = q_Y$ |
| $y$ is output of a channel $\in \mathcal{C}_{X,D}$ | $y$ is generated with precise type $q_Y$ |
| Thus, there is no restriction on $q_y$ or $q_{y \mid x}$ | Thus, iii above is redundant. |
| $x \in \mathcal{K}$ will denote transmitted codeword | $x \in \mathcal{X}^n$ will denote sequence to be source-coded |
| $y \in \mathcal{Y}^n$ will denote received sequence | $y \in \mathcal{L}$ will denote a codeword |
| $z \in \mathcal{K}$ will denote non-transmitted codeword | |
| Decoding strategy | Encoding strategy |
| If $\exists$ unique $x \in \mathcal{K}$ such that $(x, y)$ jointly typical, declare $x$ is transmitted. Else declare error. | If $\exists$ some $y \in \mathcal{L}$ such that $(x, y)$ jointly typical, encode $x$ to one such $y$. Else declare error. |
| Error events | Error events |
| $\mathcal{E}_1$: $(x, y)$ not $\epsilon$-jointly typical | $\mathcal{F}_1$: $x$ not $\epsilon$-typical |
| $\mathcal{E}_2$: $\exists z \neq x \in \mathcal{K}$ such that $(z, y)$ $\epsilon$-jointly typical | $\mathcal{F}_2$: $\nexists y \in \mathcal{L}$ such that $(x, y)$ $\epsilon$-jointly typical, given $x$ $\epsilon$-typical |
| $\Pr(\mathcal{E}_1) \to 0$ as $n \to \infty$ by $\mathcal{C}_{X,D}$ defn. | $\Pr(\mathcal{F}_1) \to 0$ as $n \to \infty$ by WLLN. |
| Analysis of $\Pr(\mathcal{E}_2)$: | Analysis of $\Pr(\mathcal{F}_2)$: |
| $z$ is generated i.i.d. $p_X$, independently of $y$ | $y$ is generated with precise type $q_Y$, independently of $x$ |
| The calculation required is the following: | The calculation required is the following: |
| Fix type of $y$, the output type, to be $q_Y$. | Fix type of $y$, the output type, to be $q_Y$. |
| Calculate probability that $(z, y)$ jointly typical given that $x$ is typical | Calculate probability that $(x, y)$ jointly typical given that $x$ is typical |
| Take worst case over $q_Y$ | Take best case over $q_Y$ |
| Worst case: maximize error probability | Best case: maximize probability that encoding is possible |
| Thus, as $\epsilon \to 0$, $q_Y$ for both problems is same | Thus, as $\epsilon \to 0$, $q_Y$ for both problems is same |
| Thus, answer to both calculations is same: call it $F(n)$ | Thus, answer to both calculations is same: call it $F(n)$ |
| Now, take a bound for whole codebook | Now, take a bound for whole codebook |
| If $(1 - F(n))^{2^{nR}} \to 1$ as $n \to \infty$, rate $R$ is achievable | If $(1 - F(n))^{2^{nR}} \to 0$ as $n \to \infty$, rate $R$ is achievable. |



Now, it turns out that $(1 - F(n))^{2^{nR}}$ exhibits a tight phase transition as $n$ gets large. Make $R$ a little bigger and it
goes to 0, and a little smaller and it goes to 1. It follows that there is a threshold $\alpha$ such that all rates $< \alpha$ are
achievable for the channel-coding problem and all rates $> \alpha$ are achievable for the source-coding problem. Thus,
$R_X(D) \le \alpha \le C_{sc}(\mathcal{C}_{X,D})$. ■
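
The phase transition can be illustrated numerically. The sketch below assumes, purely for illustration, that $F(n)$ decays exactly as $2^{-n\alpha}$; the argument above never computes $F(n)$, and the function name and the value of alpha are hypothetical.

```python
import math

def prob_no_match(n, rate, alpha):
    """Evaluate (1 - F(n)) ** (2 ** (n * rate)) with the assumed F(n) = 2 ** (-n * alpha).

    Computed as exp(N * log1p(-F)) so that a tiny F and a huge N keep precision.
    """
    f = 2.0 ** (-n * alpha)
    num_codewords = 2.0 ** (n * rate)
    return math.exp(num_codewords * math.log1p(-f))

if __name__ == "__main__":
    alpha = 0.5  # hypothetical threshold
    for n in (20, 50, 100, 200):
        below = prob_no_match(n, alpha - 0.1, alpha)
        above = prob_no_match(n, alpha + 0.1, alpha)
        print(f"n={n:4d}  R<alpha: {below:.6f}  R>alpha: {above:.6f}")
```

Under this assumption the quantity tends to 1 for rates below alpha (no competing codeword looks jointly typical, which is what channel decoding needs) and to 0 for rates above alpha (some codeword matches, which is what source encoding needs).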

Notice that this argument does not have to do any calculations for either capacity or the rate-distortion function. We just use 
the operational definition of capacity as the maximum rate of reliable communication and the operational definition of the 
rate-distortion function as the minimum rate required to source-code X to within a distortion D. 
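
The joint-typicality test shared by the channel decoder and the source encoder in the proof can be sketched as a single predicate. The sup-norm form of $\epsilon$-typicality used here is one common convention and is an assumption of this example (the paper follows the notation of [9]); all function names are hypothetical.

```python
from collections import Counter

def empirical_type(seq, alphabet):
    """Empirical distribution (type) of a sequence over a finite alphabet."""
    counts = Counter(seq)
    return {a: counts[a] / len(seq) for a in alphabet}

def is_typical(x, p_x, eps):
    """Condition i (assumed form): every letter frequency is within eps of p_x."""
    q = empirical_type(x, p_x.keys())
    return all(abs(q[a] - p_x[a]) <= eps for a in p_x)

def is_jointly_typical(x, y, p_x, d, D, eps):
    """Conditions i and ii: x is eps-typical and the average distortion is <= D."""
    avg = sum(d(a, b) for a, b in zip(x, y)) / len(x)
    return is_typical(x, p_x, eps) and avg <= D

if __name__ == "__main__":
    p_x = {0: 0.5, 1: 0.5}
    ham = lambda a, b: int(a != b)
    x = [0, 1, 0, 1, 0, 1, 0, 1]
    y = [0, 1, 0, 1, 0, 1, 0, 0]
    print(is_jointly_typical(x, y, p_x, ham, D=0.125, eps=0.01))
```

The same predicate serves both problems: the channel decoder searches the codebook for the unique $x$ that passes it with the received $y$, while the source encoder searches for any $y$ that passes it with the given $x$.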



IV. Universal source-channel separation theorem for rate-distortion assuming common randomness 

In this section, we prove a universal source-channel separation theorem in the rate-distortion context, where universality is 
over the channel. We also see an operational, direct view of source-channel separation for rate-distortion. 

Universal lossy communication to within a distortion D over a channel set A: Channel set $\mathcal{A}$ is said to be capable of
universally communicating an i.i.d. $X$ source to within a distortion level $D$ if there exist encoder-decoders
$e = \langle e^n \rangle_1^\infty$, $d = \langle d^n \rangle_1^\infty$ such that all the composite channels (the composition of
encoder, channel from $\mathcal{A}$, and decoder) $\langle d^n \circ c^n \circ e^n \rangle_1^\infty$ directly communicate an i.i.d.
$X$ source to within a distortion $D$, for all $c = \langle c^n \rangle_1^\infty \in \mathcal{A}$. In other words,
$d \circ c \circ e \in \mathcal{C}_{X,D}$ for all $c \in \mathcal{A}$.



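[Figure: $X^n$ passes through the encoder, a channel $c \in \mathcal{A}$, and the decoder to give $Y^n$, with
$\Pr\left[ \frac{1}{n} \sum_{i=1}^{n} d(X^n(i), Y^n(i)) > D \right] \to 0$ as $n \to \infty$ for all $c \in \mathcal{A}$.]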



The composite channel set $\{d \circ c \circ e : c \in \mathcal{A}\}$ will be denoted by $d \circ \mathcal{A} \circ e$.

Theorem 4.1 (Universal source-channel theorem for rate-distortion (USCS)): Assuming there is common randomness, in order 
to communicate i.i.d. X to within a distortion level D universally over a channel set A, it is sufficient to consider architectures 
which first source code the source to within a distortion level D followed by universal reliable channel coding over A. 

Proof: Let $\mathcal{A}$ be a channel set. Consider the following three statements.

$S_1$: $C_{sc}(\mathcal{A}) > R_X(D)$. $S_1^*$: $C_{sc}(\mathcal{A}) \ge R_X(D)$. $S_2$: $\mathcal{A}$ is capable of universally
communicating an i.i.d. $X$ source to within a distortion $D$ using an sc-encoder-decoder.

The proof of $S_1 \Rightarrow S_2$ is the usual argument of source coding followed by channel coding. Roughly, source-code the
i.i.d. $X$ source. The output of the source-code is a message set of cardinality $2^{nR_X(D)}$ with a probability distribution on
it. Communicate the message universally and reliably over $\mathcal{A}$ with an sc-encoder-decoder. This proof is rough, but since
everything involved is standard, a precise proof is omitted. This completes the proof of $S_1 \Rightarrow S_2$.

To prove $S_2 \Rightarrow S_1^*$: We will keep referring to the figure below, which gives a step-by-step view of the argument.
$\mathcal{A}$ (black rectangle) is capable of universally communicating i.i.d. $X$ to within a distortion $D$ with an
sc-encoder-decoder. That is, there exists an sc-encoder-decoder $e_a = \langle e_a^n \rangle_1^\infty$,
$d_a = \langle d_a^n \rangle_1^\infty$ such that for every $c = \langle c^n \rangle_1^\infty \in \mathcal{A}$, the composite channel
$d_a \circ c \circ e_a = \langle d_a^n \circ c^n \circ e_a^n \rangle_1^\infty$ directly communicates an i.i.d. $X$ source to within
a distortion $D$. That is, $d_a \circ c \circ e_a \in \mathcal{C}_{X,D}$ for all $c \in \mathcal{A}$. A more compact way of saying
this is that the channel set $\mathcal{C}_a = d_a \circ \mathcal{A} \circ e_a \subset \mathcal{C}_{X,D}$ (yellow rectangle). Thus,
by Theorem 3.1, $C_{sc}(\mathcal{C}_a) \ge R_X(D)$. So there exists an sc-encoder-decoder $e_b = \langle e_b^n \rangle_1^\infty$,
$d_b = \langle d_b^n \rangle_1^\infty$ such that with this encoder-decoder, there is reliable communication across $\mathcal{C}_a$
at rates below $R_X(D)$ (magenta rectangle). Now,
$d_b \circ \mathcal{C}_a \circ e_b = d_b \circ d_a \circ \mathcal{A} \circ e_a \circ e_b = (d_b \circ d_a) \circ \mathcal{A} \circ (e_a \circ e_b) = d_f \circ \mathcal{A} \circ e_f$,
where $d_f = d_b \circ d_a$ and $e_f = e_a \circ e_b$ form an sc-encoder-decoder pair (red rectangles) that achieves universal
reliable communication over $\mathcal{A}$. Thus, $C_{sc}(\mathcal{A}) \ge R_X(D)$. This proves $S_2 \Rightarrow S_1^*$.
Theorem 4.1 follows.





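[Figure: layered architecture. The inner encoder-decoder $e_a$, $d_a$ around the channel set $\mathcal{A}$, using common randomness,
guarantees $\lim_{n \to \infty} \Pr\left[ \frac{1}{n} \sum_{i=1}^{n} d(X^n(i), Y^n(i)) > D \right] = 0$ for all $c \in \mathcal{A}$.
The outer encoder-decoder $e_b$, $d_b$, also using common randomness, combines with it into the composite encoder $e_f$ and decoder
$d_f = d_b \circ d_a$, giving universal reliable communication for all rates $< R_X(D)$.]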



One gets the following layered architecture for reliable communication: an architecture for reliable communication built "on
top of" an architecture for communication to within a distortion $D$. See figure above. $d_a \circ c \circ e_a$ is the architecture
for communication to within a distortion $D$ (blue rectangle), and an architecture for reliable communication, using
encoder-decoder $e_b$, $d_b$, is built "on top of" it (magenta rectangle). This is a "direct" reduction of the reliable
communication problem to the problem of "communication to within a distortion $D$." The view is operational because the proofs of
Theorems 3.1 and 4.1 are operational. We believe that the USCS perspective might be useful in network problems.
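
The layering can be mimicked with plain function composition. The sketch below is a toy under strong assumptions: the channel is a single deterministic map rather than a compound set, the inner pair is the identity, and the outer pair is a 3-fold repetition code with majority decoding. None of these choices come from the paper; they only show how an outer encoder-decoder wraps an inner one.

```python
def repeat3(bits):
    """Outer encoder e_b (toy): repeat every bit three times."""
    return [b for b in bits for _ in range(3)]

def majority3(bits):
    """Outer decoder d_b (toy): majority vote over each block of three."""
    return [1 if sum(bits[i:i + 3]) >= 2 else 0 for i in range(0, len(bits), 3)]

def identity(seq):
    """Inner encoder e_a and inner decoder d_a (toy): pass the sequence through."""
    return list(seq)

def channel(seq):
    """Toy channel c: corrupts the first symbol of every block of three."""
    return [b ^ 1 if i % 3 == 0 else b for i, b in enumerate(seq)]

def compose(*fs):
    """compose(f, g, h)(x) == f(g(h(x))): the rightmost function acts first."""
    def composed(x):
        for f in reversed(fs):
            x = f(x)
        return x
    return composed

# Composite encoder: the outer encoder e_b acts first, then the inner encoder e_a.
e_f = compose(identity, repeat3)
# Composite decoder: the inner decoder d_a acts first, then the outer decoder d_b.
d_f = compose(majority3, identity)
end_to_end = compose(d_f, channel, e_f)

if __name__ == "__main__":
    message = [1, 0, 1, 1, 0]
    print(end_to_end(message) == message)
```

Here the inner layer alone only delivers its input to within Hamming distortion 1/3, yet the outer layer built on top of it recovers the message exactly, which is the shape of the reduction described above.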



V. Conclusion 

The results in this paper imply that there are natural equivalence relationships among communication problems. Here, the 
equivalence is shown by explicit reductions from one problem to another in a spirit analogous to [10]. In light of this, the 
traditional mutual-information characterization of rate can be viewed as a kind of key "invariant" that labels the equivalence 
classes. The implication is that if common-randomness is available, then there is nothing sacred about the traditional layering: 
source-coding followed by reliable communication over channels. Instead, the inner layer could just as well be something that 
is only guaranteed to communicate e.g. an asymmetric ternary source with $P(a) = 2P(b) = 3P(c)$ to within a fixed Hamming
distortion. There will be no loss of optimality by forcing this seemingly bizarre architecture.

However, it turns out that the common-randomness is really critical: in general for any Theorem 3.1 style universal reduction of
a reliable communication problem to one with non-zero distortion, a significant amount of common-randomness is required [11]. 
This suggests that there might be something special about the traditional layering after all: no additional common-randomness 
is required if the inner layer gives a reliable communication guarantee. Furthermore, it suggests that there might be other 
interesting "invariants" out there besides the rate-distortion function even for the simple stationary memoryless sources with 
additive distortion measures considered here. 